Intel’s P6 Uses Decoupled Superscalar Design: 2/16/95

Author

  • Linley Gwennap
Abstract

Intel’s forthcoming P6 processor (see cover story) is designed to outperform all other x86 CPUs by a significant margin. Although it shares some design techniques with competitors such as AMD’s K5, NexGen’s Nx586, and Cyrix’s M1, the new Intel chip has several important advantages over them. The P6’s deep pipeline eliminates the cache-access bottlenecks that restrict its competitors to clock speeds of about 100 MHz. The new CPU is designed to run at 133 MHz in its initial 0.5-micron BiCMOS implementation; a 0.35-micron version, due next year, could push the speed as high as 200 MHz. In addition, the Intel design uses a closely coupled secondary cache to speed memory accesses, a critical issue for high-frequency CPUs. Intel will combine the P6 CPU and a 256K cache chip in a single PGA package, reducing the time needed for data to move from the cache to the processor. Like some of its competitors, the P6 translates x86 instructions into simple, fixed-length instructions that Intel calls micro-operations, or uops (pronounced “you-ops”). These uops are then executed in a decoupled superscalar core capable of register renaming and out-of-order execution. Intel calls this combination of features “dynamic execution”; it is neither new nor unique, but it is highly effective in increasing x86 performance. The P6 also implements a new system bus with more bandwidth than the Pentium bus. The new bus can support up to four P6 processors with no glue logic, reducing the cost of developing and building multiprocessor systems. This feature set makes the new processor particularly attractive for servers; it will also be used in high-end desktop PCs and, eventually, in mainstream PC products.
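
The abstract names register renaming as one ingredient of the decoupled core but does not describe the mechanism. The sketch below illustrates the general technique under simplifying assumptions: one common scheme (a rename table mapping architectural registers to a larger pool of physical registers, plus a free list), three-operand uops, and in-order retirement. The `Uop`, `RenamedUop`, and `Renamer` names and the uop format are hypothetical, invented for illustration, and this is not necessarily how the P6 implements renaming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    """Hypothetical three-operand micro-op: dest <- op(src1, src2)."""
    op: str
    dest: str
    src1: str
    src2: str


@dataclass(frozen=True)
class RenamedUop:
    """The same micro-op after renaming, naming physical registers by index."""
    op: str
    dest: int      # newly allocated physical register
    src1: int      # physical register holding the first source
    src2: int      # physical register holding the second source
    old_dest: int  # previous mapping of dest; reclaimed when this uop retires


class Renamer:
    """Hypothetical rename table plus free list (illustrative only, not Intel's P6 logic)."""

    def __init__(self, arch_regs, num_phys):
        if num_phys < len(arch_regs):
            raise ValueError("need at least one physical register per architectural register")
        # At reset, architectural register i lives in physical register i.
        self.table = {reg: i for i, reg in enumerate(arch_regs)}
        self.free = list(range(len(arch_regs), num_phys))

    def rename(self, uop):
        # Look up sources before remapping dest, so "eax <- eax + ebx" reads the old eax.
        src1 = self.table[uop.src1]
        src2 = self.table[uop.src2]
        if not self.free:
            raise RuntimeError("no free physical register: the front end would stall here")
        new = self.free.pop(0)
        old = self.table[uop.dest]
        self.table[uop.dest] = new
        return RenamedUop(uop.op, new, src1, src2, old)

    def retire(self, renamed):
        # Once this uop retires in order, every older uop that could read the
        # previous physical register has already retired, so it can be reclaimed.
        self.free.append(renamed.old_dest)


program = [
    Uop("add", "eax", "eax", "ebx"),
    Uop("mul", "ecx", "eax", "edx"),
    Uop("sub", "eax", "ebx", "edx"),  # overwrites eax, which the mul still needs to read
    Uop("add", "edx", "eax", "ecx"),
]

renamer = Renamer(["eax", "ebx", "ecx", "edx"], num_phys=8)
renamed = [renamer.rename(u) for u in program]
for u in renamed:
    print(u)
```

With eight physical registers, the `sub` that overwrites `eax` is assigned physical register 6 while the `mul` still reads the earlier `eax` result from physical register 4. Because the two uops no longer share a storage location, an out-of-order core is free to execute the `sub` before the `mul`.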

Similar resources

New Algorithm Improves Branch Prediction: 3/27/95

Intel’s P6 processor (see 090202.PDF) is the first to use a two-level branch-prediction algorithm to improve accuracy. This algorithm, first published by Tse-Yu Yeh and Yale Patt, has the potential to push accuracy well beyond the 90% level achieved by the best processors today. As future processors look to improve performance by increasing the issue rate and/or extending the pipeline depth, th...

K7 Challenges Intel: 10/26/98

Tired of eating Intel’s performance dust, AMD is preparing a new entry in the battle for the CPU socket in high-end PCs. At the Microprocessor Forum earlier this month, chief architect Dirk Meyer described AMD’s next-generation K7 processor, on which the company will place its hopes for higher ASPs and improved profitability. The new processor was jointly developed by Meyer’s team in Austin (Tex...

The limits of a decoupled out-of-order superscalar architecture

This thesis presents a study of a technique for improving performance in out-of-order superscalar architectures. It identifies three technological trends limiting superscalar performance: the increasing cost of a main memory access, control dependencies, and the greater hardware complexity of out-of-order execution. Decoupling is a technique that can provide higher performance through ...

A complexity-effective microprocessor design with decoupled dispatch queues and prefetching

Continuing demands for high degrees of Instruction Level Parallelism (ILP) require large dispatch queues (or centralized reservation stations) in modern superscalar microprocessors. However, such large dispatch queues are inevitably accompanied by high circuit complexity which would correspondingly limit the pipeline clock rates. In other words, increasing the size of the dispatch queue ultimat...

Publication date: 1995